Tracking and Far-Field Speech Recognition for Multiple Simultaneous Speakers

نویسندگان

  • Tobias Gehrig
  • John McDonough
چکیده

In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, the JPDAF speaker tracking system reduced the multiple object tracking errror from 20.7% to 14.3% with respect to the IEKF system. In a set of automatic speech recognition experiments conducted on the output of a 64 channel microphone array which was beamformed using automatic speaker position estimates, applying the JPDAF tracking system reduced word error rate from 67.3% to 66.0%. Moreover, the word error rate on the beamformed output was 13.0% absolute lower than on a single channel of the array.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The sheffield wargames corpus

Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business style meetings where people are mostly in a static position in a room. Only limited data is available that contains high quality near and far fiel...

متن کامل

Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters

In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorpora...

متن کامل

The Effects of Culture and Gender on the Recognition of Emotional Speech: Evidence from Persian Speakers Living in a Collectivist Society

This paper reports on a behavioral study that explores the role of culture and gender in the recognition of emotional speech in an under investigated cultural context (a collectivist society: i.e., Iran). Participants were asked to recognize the emotional prosody of a set of validated emotional vocal portrayals (including the five basic emotions). Findings of the experiment were then comp...

متن کامل

Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking

Separating speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. In real-time applications online solutions capable of separating the signals as they are observed are required in contrast to separating the signals offline after observation. Often a talker may move, which should also be considered by the separation system. This work pr...

متن کامل

Maximum Negentropy Beamforming

In this paper, we address an adaptive beamforming application based on the capture of far-field speech data from a single speaker in a real meeting room. After the position of a speaker is estimated by a speaker tracking system, we construct a subband-domain beamformer in generalized sidelobe canceller (GSC) configuration. In contrast to conventional practice, we then optimize the active weight...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006